feat: add Qwen3-Omni Thinker GSPO support #6238

Draft

qinganrice wants to merge 4 commits into verl-project:main from qinganrice:qwen3-omni-thinker-v2

Conversation

@qinganrice

Summary

  • Register the Qwen3-Omni model in AutoModelForCausalLM with a forward redirect to the Thinker, and fix tie_word_embeddings and _no_split_modules for FSDP compatibility (sketch below)
  • Fix an FSDP LoRA deadlock: skip the lambda wrap policy when min_num_params > 0 to avoid divergent nested-FSDP allgathers (sketch below)
  • Cast LoRA parameters to the base model's dtype after get_peft_model so FSDP can flatten mixed-dtype units (sketch below)
  • Strip unused sub-modules (Talker/Code2Wav) after from_pretrained via _verl_strip_modules (sketch below)
  • Add Thinker layer prefixes to layered_summon, with a fallback to a full summon when the layered pass returns empty (sketch below)
  • Fix the text_config fallback in monkey_patch for models without a top-level num_attention_heads (sketch below)
  • Duck-type the vLLM LoRA request check to support vllm-omni's LoRARequest (sketch below)
  • Add a gsm8k_thinker reward with </think> extraction and \boxed{} support (sketch below)
  • Register vllm_omni / vllm_omni_ar in the rollout and replica registries for verl-omni integration
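
A minimal sketch of the AutoModelForCausalLM registration and Thinker redirect. The transformers class names (Qwen3OmniMoeConfig, Qwen3OmniMoeForConditionalGeneration) follow the library's naming convention but are not confirmed from this diff, and the decoder-layer name in _no_split_modules is a placeholder guess:

```python
from transformers import (
    AutoModelForCausalLM,
    Qwen3OmniMoeConfig,
    Qwen3OmniMoeForConditionalGeneration,
)


class Qwen3OmniThinkerForCausalLM(Qwen3OmniMoeForConditionalGeneration):
    # Keep decoder layers unsplit so FSDP wraps each one as a whole unit;
    # the class name listed here is a guess, not taken from the PR.
    _no_split_modules = ["Qwen3OmniMoeThinkerTextDecoderLayer"]

    def __init__(self, config):
        # The Thinker keeps its own lm_head, so don't tie it to the input
        # embeddings (on composite configs the flag's location may differ).
        config.tie_word_embeddings = False
        super().__init__(config)

    def forward(self, *args, **kwargs):
        # Delegate to the Thinker (the text-reasoning sub-model); the Talker
        # and Code2Wav heads are never used during GSPO training.
        return self.thinker(*args, **kwargs)

    def get_input_embeddings(self):
        return self.thinker.get_input_embeddings()

    def get_output_embeddings(self):
        return self.thinker.get_output_embeddings()


AutoModelForCausalLM.register(
    Qwen3OmniMoeConfig, Qwen3OmniThinkerForCausalLM, exist_ok=True
)
```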
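A sketch of the deadlock guard, assuming verl selects among torch's FSDP auto-wrap policies roughly like this (get_wrap_policy and its arguments are illustrative, not verl's exact signature):

```python
import functools

from torch.distributed.fsdp.wrap import (
    lambda_auto_wrap_policy,
    size_based_auto_wrap_policy,
)


def get_wrap_policy(min_num_params: int, is_lora: bool):
    if min_num_params > 0:
        # Size-based wrapping only. Also installing the LoRA lambda policy made
        # ranks disagree on FSDP unit nesting and hang in allgather.
        return functools.partial(
            size_based_auto_wrap_policy, min_num_params=min_num_params
        )
    if is_lora:
        def lambda_fn(module):
            # Give each trainable leaf (e.g. a LoRA A/B linear) its own FSDP unit.
            return (
                len(list(module.named_children())) == 0
                and getattr(module, "weight", None) is not None
                and module.weight.requires_grad
            )
        return functools.partial(lambda_auto_wrap_policy, lambda_fn=lambda_fn)
    return None
```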
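The dtype cast after get_peft_model, sketched with PEFT's public API (build_lora_model is an illustrative helper, not a verl function):

```python
from peft import LoraConfig, get_peft_model


def build_lora_model(base_model, lora_config: LoraConfig):
    base_dtype = next(base_model.parameters()).dtype  # e.g. torch.bfloat16
    peft_model = get_peft_model(base_model, lora_config)
    for name, param in peft_model.named_parameters():
        # Freshly initialized lora_A/lora_B weights default to fp32; FSDP cannot
        # flatten an fp32 adapter and a bf16 base weight into one flat parameter.
        if "lora_" in name and param.dtype != base_dtype:
            param.data = param.data.to(base_dtype)
    return peft_model
```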
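A sketch of the stripping hook; _verl_strip_modules is the PR's name, but the attribute names ("talker", "code2wav") and the Identity-stub approach are assumptions:

```python
import torch.nn as nn


def _verl_strip_modules(model: nn.Module, names=("talker", "code2wav")) -> nn.Module:
    """Drop sub-models that GSPO never touches so FSDP doesn't shard their weights."""
    for name in names:
        if hasattr(model, name):
            # An Identity stub keeps attribute access valid while freeing the
            # weights (the originals are released once nothing references them).
            setattr(model, name, nn.Identity())
    return model
```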
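A rough sketch of the layered-summon idea with a full-summon fallback; the prefixes, the name-cleaning step, and collect_lora_weights are all illustrative simplifications of whatever layered_summon actually does:

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

THINKER_PREFIXES = ("thinker.model.layers.", "thinker.visual.", "thinker.audio_tower.")


def collect_lora_weights(fsdp_model):
    collected = {}
    for name, module in fsdp_model.named_modules():
        clean = name.replace("_fsdp_wrapped_module.", "")  # drop FSDP wrapper segments
        # Match exactly one block per prefix (e.g. thinker.model.layers.0) so only
        # that block's shards are gathered at a time, bounding peak memory.
        if any(clean.startswith(p) and clean.count(".") == p.count(".")
               for p in THINKER_PREFIXES):
            with FSDP.summon_full_params(module, writeback=False):
                for pname, param in module.named_parameters():
                    if "lora_" in pname:
                        collected[f"{clean}.{pname}"] = param.detach().cpu()
    if not collected:
        # Layered matching found nothing (unexpected layout): summon the whole
        # model in one pass instead.
        with FSDP.summon_full_params(fsdp_model, writeback=False):
            collected = {n: p.detach().cpu()
                         for n, p in fsdp_model.named_parameters() if "lora_" in n}
    return collected
```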
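The text_config fallback pattern, sketched generically (get_text_config_attr is a hypothetical helper):

```python
def get_text_config_attr(config, name: str):
    # Multimodal wrapper configs (like Qwen3-Omni's) nest the LM settings under
    # config.text_config instead of exposing them at the top level.
    if hasattr(config, name):
        return getattr(config, name)
    return getattr(config.text_config, name)


# e.g. num_heads = get_text_config_attr(model.config, "num_attention_heads")
```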
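The duck-typing idea in one function; the three attribute names are vLLM's public LoRARequest fields, and is_lora_request is an illustrative name:

```python
def is_lora_request(obj) -> bool:
    # Checking attributes instead of isinstance against vLLM's LoRARequest class
    # lets vllm-omni's separate LoRARequest type pass through unchanged.
    return all(hasattr(obj, a) for a in ("lora_name", "lora_int_id", "lora_path"))
```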
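A sketch of the gsm8k_thinker extraction logic under the usual GSM8K conventions; the exact regexes in the PR may differ (review feedback below notes, e.g., currency-symbol handling):

```python
import re


def extract_answer(response: str) -> str | None:
    # Keep only the text after the final </think> tag, i.e. the model's answer.
    answer_part = response.rsplit("</think>", 1)[-1]
    # Prefer a \boxed{...} answer (this sketch handles non-nested braces only).
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", answer_part)
    if boxed:
        return boxed[-1].strip().replace(",", "")
    # Otherwise fall back to the last number in the answer, GSM8K-style.
    numbers = re.findall(r"-?\d[\d,]*\.?\d*", answer_part)
    return numbers[-1].replace(",", "") if numbers else None


def compute_score(solution_str: str, ground_truth: str) -> float:
    pred = extract_answer(solution_str)
    return 1.0 if pred is not None and pred == ground_truth.strip() else 0.0
```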

Test plan

  • End-to-end GSPO LoRA training with the Qwen3-Omni Thinker model

@CLAassistant

CLAassistant commented May 4, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces support for the Qwen3-Omni model architecture and enhances FSDP and LoRA handling. Key changes include registering the Qwen3-Omni Thinker as a causal language model with custom forward and embedding logic, implementing a module stripping mechanism to reduce memory usage during FSDP initialization, and adding a new reward scoring utility (gsm8k_thinker) designed for models that output reasoning steps. Additionally, the PR updates LoRA parameter collection to support diffusers and adds a fallback mechanism for parameter summoning. Review feedback highlights the need to narrow broad architecture mappings to prevent conflicts with encoder-decoder models, improve exception handling during model registration, refine regex patterns in the reward scorer to handle currency symbols, and remove debug print statements from production code.

Review comment threads (outdated)

  • verl/utils/model.py (two threads)
  • verl/utils/reward_score/gsm8k_thinker.py
  • verl/workers/engine/fsdp/transformer_impl.py
